Half-Duplex Relaying for the Multiuser Channel
This work studies half-duplex (HD) relaying in the Multiple
Access Relay Channel (MARC) and the Compound Multiple Access Channel with a
Relay (cMACr). A generalized Quantize-and-Forward (GQF) scheme is proposed to
establish the achievable rate regions. The scheme is a variation of the
Quantize-and-Forward (QF) scheme built on a single-block, two-slot coding
structure. The results in this paper can also be considered a significant
extension of the achievable rate region of the Half-Duplex Relay Channel
(HDRC). Furthermore, the rate regions based on the GQF scheme are extended to
the Gaussian channel case. The performance of the scheme is illustrated
through numerical examples.
Comment: 7 pages, 4 figures, conference pape
Automatic tagging and geotagging in video collections and communities
Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We overview three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, definition, and the data set released. For each task, a reference algorithm that was used within MediaEval 2010 is presented, along with comments on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that users assign to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.
Privacy-preserving Representation Learning for Speech Understanding
Existing privacy-preserving speech representation learning methods target a
single application domain. In this paper, we present a novel framework to
anonymize utterance-level speech embeddings generated by pre-trained encoders
and show its effectiveness for a range of speech classification tasks.
Specifically, given the representations from a pre-trained encoder, we train a
Transformer to estimate the representations for the same utterances spoken by
other speakers. During inference, the extracted representations can be
converted into different identities to preserve privacy. We compare the results
with the voice anonymization baselines from the VoicePrivacy 2022 challenge. We
evaluate our framework on speaker identification for privacy and emotion
recognition, depression classification, and intent classification for utility.
Our method outperforms the baselines on privacy and utility in paralinguistic
tasks and achieves comparable performance for intent classification.
Comment: INTERSPEECH 202
A Speech Representation Anonymization Framework via Selective Noise Perturbation
Privacy and security are major concerns when communicating speech signals to
cloud services such as automatic speech recognition (ASR) and speech emotion
recognition (SER). Existing solutions for speech anonymization mainly focus on
voice conversion or voice modification to convert a raw utterance into another
one with similar content but different, or no, identity-related information.
However, the alternative approach of sharing speech data in the form of
privacy-preserving representations has been largely under-explored. In this
paper, we propose a speech anonymization framework that achieves privacy via
noise perturbation to a selected subset of the high-utility representations
extracted using a pre-trained speech encoder. The subset is chosen with a
Transformer-based privacy-risk saliency estimator. We validate our framework on
four tasks, namely, Automatic Speaker Verification (ASV), ASR, SER and Intent
Classification (IC) for privacy and utility assessment. Experimental results
show that our approach achieves competitive, or even better, utility than the
speech anonymization baselines from the VoicePrivacy 2022 Challenge at the
same level of privacy. Moreover, the easily controlled amount of perturbation
gives our framework a flexible range of privacy-utility trade-offs without
re-training any component
- …
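The idea of perturbing only a selected subset of a representation can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `perturb_selected`, the parameters `fraction` and `noise_std`, and the choice of Gaussian noise are all assumptions made for illustration, and the per-dimension `saliency` scores are simply passed in rather than produced by a Transformer-based estimator as in the abstract.

```python
import random


def perturb_selected(embedding, saliency, fraction=0.25, noise_std=1.0, seed=None):
    """Add noise only to the dimensions with the highest privacy-risk score.

    embedding : list of floats, one utterance-level representation
    saliency  : list of floats, hypothetical privacy-risk score per dimension
    fraction  : share of dimensions to perturb (assumed control knob)
    noise_std : standard deviation of the Gaussian noise (assumed distribution)
    """
    if len(embedding) != len(saliency):
        raise ValueError("embedding and saliency must have the same length")

    rng = random.Random(seed)
    n = len(embedding)
    k = max(0, min(n, round(fraction * n)))

    # Rank dimensions by saliency and keep the k riskiest ones.
    ranked = sorted(range(n), key=lambda i: saliency[i], reverse=True)
    selected = sorted(ranked[:k])

    perturbed = list(embedding)
    for i in selected:
        perturbed[i] += rng.gauss(0.0, noise_std)
    return perturbed, selected


if __name__ == "__main__":
    emb = [0.5, -1.2, 0.3, 2.0]
    risk = [0.9, 0.1, 0.2, 0.8]
    noisy, idx = perturb_selected(emb, risk, fraction=0.5, noise_std=0.1, seed=0)
    print("perturbed dimensions:", idx)
    print("perturbed embedding:", noisy)
```

In this sketch, changing `fraction` or `noise_std` alters the amount of perturbation without any retraining, which mirrors the kind of privacy-utility knob the abstract describes.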